Caio Raphael

Odin examples - SIMD
core:simd
len
- For a #simd vector v: the number of elements (lanes) in v.
The idea is to group the elements together in a SIMD vector (#simd[N]T) so that the intrinsic operations are performed on all N lanes at once.
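A minimal sketch of that idea, assuming a recent Odin release whose `core:simd` package provides `reduce_add_ordered` and where the arithmetic operators work lane-wise on `#simd` types. Two groups of four `f32` values are held in `#simd[4]f32` vectors, multiplied lane-wise, and reduced to a scalar. `dot4` is a hypothetical helper name, not part of any library.

```odin
package main

import "core:fmt"
import "core:simd"

// dot4 (hypothetical helper): dot product of two 4-lane vectors.
dot4 :: proc(a, b: #simd[4]f32) -> f32 {
	prod := a * b                        // one lane-wise multiply covers all 4 elements
	return simd.reduce_add_ordered(prod) // horizontal add: lane 0 + lane 1 + lane 2 + lane 3
}

main :: proc() {
	a := #simd[4]f32{1, 2, 3, 4}
	b := #simd[4]f32{5, 6, 7, 8}
	fmt.println("lanes:", len(a)) // len on a #simd vector gives the lane count
	fmt.println("dot:", dot4(a, b))
}
```

The multiply is a single vector operation; only the final reduction works across lanes.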
SIMD vectors are comparable (they support == and !=).
Casting:
- raw_data(^#simd[$N]$E) -> [^]E
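The `raw_data` cast above can be exercised with a small sketch: a pointer to a `#simd[4]f32` becomes a `[^]f32` multi-pointer, which can read or write individual lanes in place. `set_lane` is a hypothetical helper; the explicit `assert` is there because multi-pointer indexing is not bounds-checked.

```odin
package main

import "core:fmt"

// set_lane (hypothetical helper): overwrites one lane through the [^]f32
// multi-pointer that raw_data returns for a ^#simd[4]f32.
set_lane :: proc(v: ^#simd[4]f32, i: int, x: f32) {
	assert(0 <= i && i < 4) // multi-pointers are not bounds-checked
	p := raw_data(v)        // p: [^]f32 pointing at lane 0
	p[i] = x
}

main :: proc() {
	v := #simd[4]f32{1, 2, 3, 4}
	set_lane(&v, 2, 10)
	p := raw_data(&v)
	fmt.println(p[0], p[1], p[2], p[3])
}
```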
Implicit conversion:
- T -> #simd[N]T (the scalar is broadcast to every lane)
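A sketch of the scalar-to-vector conversion, assuming it applies to an ordinary `f32` variable exactly as listed above: assigning `s` to a `#simd[4]f32` copies it into all four lanes, after which a normal lane-wise multiply scales the whole vector. `scale` is a hypothetical helper name.

```odin
package main

import "core:fmt"

// scale (hypothetical helper): multiplies every lane by the same scalar.
scale :: proc(v: #simd[4]f32, s: f32) -> #simd[4]f32 {
	splat: #simd[4]f32 = s // implicit T -> #simd[N]T: s is copied into every lane
	return v * splat
}

main :: proc() {
	v := #simd[4]f32{1, 2, 3, 4}
	fmt.println(transmute([4]f32)scale(v, 0.5))
}
```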
For Matrices:
- Column-major is used in order to utilize (SIMD) vector instructions effectively on modern hardware, if possible.
- Unlike normal arrays, matrices try to maximize alignment to allow for (SIMD) vectorization, while keeping zero padding: there is none between columns (assuming the default layout) and none at the end of the type.
- Zero padding is a compromise for compatibility with third-party libraries, rather than an optimization for performance. Padding between columns (assuming the default layout) was not used, even though it would have allowed each column to be loaded individually into a SIMD register with the correct alignment.
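The column-major, zero-padding layout described in these notes can be observed directly. This is a minimal sketch, assuming Odin's default (column-major) matrix layout: the literal is written row by row, but reinterpreting the matrix as a flat array shows the elements stored column by column. The 36-byte size for a 3x3 `f32` matrix follows from the zero-padding rule above. `column_major_storage` is a hypothetical helper name.

```odin
package main

import "core:fmt"

// column_major_storage (hypothetical helper): reinterprets a 2x3 matrix
// as its underlying memory, showing the order in which elements are stored.
column_major_storage :: proc(m: matrix[2, 3]f32) -> [6]f32 {
	return transmute([6]f32)m
}

main :: proc() {
	// The literal is written row by row...
	m := matrix[2, 3]f32{
		1, 2, 3,
		4, 5, 6,
	}
	// ...but memory holds column 0 (1, 4), then column 1 (2, 5), then column 2 (3, 6).
	fmt.println(column_major_storage(m))
	fmt.println(m[1, 2]) // element at row 1, column 2
	// No padding: 9 * 4 = 36 bytes for a 3x3 f32 matrix.
	fmt.println(size_of(matrix[3, 3]f32), align_of(matrix[3, 3]f32))
}
```

The alignment is printed rather than stated, since the value the compiler picks depends on how it balances alignment against the no-padding rule.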

From the Odin examples:
The best vector width (number of lanes) will depend on the available SIMD instructions, and possibly on the hardware itself.
On amd64, the default target only uses SSE4, which has 128-bit SIMD registers. This means that #simd[4]f32 would be the native vector size on that target, but that doesn't always give the fastest results, as larger vectors allow for better instruction-level parallelism. For larger vectors, LLVM will automatically spread the data over multiple SIMD registers, but if the SIMD vector is too large this starts to hurt performance. Try comparing the results between 4, 8, and 16 lanes.
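One way to run that comparison is to make the lane count a compile-time parameter, so a single procedure can be instantiated at 4, 8, and 16 lanes. A minimal sketch, assuming `core:simd` provides `from_slice` and `reduce_add_ordered`; `sum_simd` is a hypothetical helper, and the timing itself is left out.

```odin
package main

import "core:fmt"
import "core:simd"

// sum_simd (hypothetical helper): sums a slice with N-lane vectors.
// N is a compile-time parameter, so the same code can be compared at 4, 8 and 16 lanes.
sum_simd :: proc(data: []f32, $N: int) -> f32 {
	acc: #simd[N]f32 // zero-initialized accumulator
	i := 0
	for i + N <= len(data) {
		acc += simd.from_slice(#simd[N]f32, data[i:i + N])
		i += N
	}
	total := simd.reduce_add_ordered(acc)
	for i < len(data) { // scalar tail: the last len(data) % N elements
		total += data[i]
		i += 1
	}
	return total
}

main :: proc() {
	data := make([]f32, 1000)
	defer delete(data)
	for j in 0..<len(data) {
		data[j] = f32(j % 10)
	}
	// Time each call (e.g. with core:time) to find the best width on a given machine.
	fmt.println(sum_simd(data, 4), sum_simd(data, 8), sum_simd(data, 16))
}
```

Changing N also changes the order of the floating-point additions, so on general data the three results can differ in the last bits.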
You will likely see a performance boost, particularly in the f64 case, by enabling AVX, which around 97% of PCs support (according to the October 2024 Steam hardware survey). You can enable it by building with -target-features:avx (or -microarch:x86-64-v3, which enables a set of CPU features available on many modern systems). However, note that doing so will cause the program to crash on systems that don't support these features!

Aligned SIMD loads and stores often expect 16-, 32-, or 64-byte alignment, matching the 128-, 256-, and 512-bit register widths.
SIMD vectors are sensitive to alignment.
Using an unaligned load is simple and has a negligible cost on most modern systems.
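A sketch of an unaligned load, assuming `base:intrinsics` provides `unaligned_load(^T) -> T`: four consecutive `f32` values are read from an arbitrary offset in a slice, where the address is only guaranteed the 4-byte alignment of `f32`. `load4` is a hypothetical helper name.

```odin
package main

import "base:intrinsics"
import "core:fmt"

// load4 (hypothetical helper): loads data[offset:offset+4] into a vector
// without assuming the address is aligned for #simd[4]f32.
load4 :: proc(data: []f32, offset: int) -> #simd[4]f32 {
	assert(offset >= 0 && offset + 4 <= len(data))
	p := (^#simd[4]f32)(&data[offset])
	// Dereferencing p directly (p^) would assume align_of(#simd[4]f32) alignment;
	// unaligned_load makes no such assumption.
	return intrinsics.unaligned_load(p)
}

main :: proc() {
	buf := [8]f32{0, 1, 2, 3, 4, 5, 6, 7}
	fmt.println("align_of(#simd[4]f32):", align_of(#simd[4]f32))
	// &buf[1] is only guaranteed 4-byte alignment (the alignment of f32).
	v := load4(buf[:], 1)
	fmt.println(transmute([4]f32)v)
}
```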